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Curvelets:  A Surprisingly  Effective  Nonadaptive 
Representation  for  Objects  with  Edges 


Emmanuel  J.  Candes  and  David  L.  Donoho 


Abstract.  It  is  widely  believed  that  to  efficiently  represent  an  otherwise 
smooth  object  with  discontinuities  along  edges,  one  must  use  an  adaptive 
representation  that  in  some  sense  ‘tracks’  the  shape  of  the  discontinuity 
set.  This  folk-belief  — some  would  say  folk-theorem  — is  incorrect.  At 
the  very  least,  the  possible  quantitative  advantage  of  such  adaptation  is 
vastly  smaller  than  commonly  believed.  We  have  recently  constructed  a 
tight  frame  of  curvelets  which  provides  stable,  efficient,  and  near-optimal 
representation  of  otherwise  smooth  objects  having  discontinuities  along 
smooth  curves.  By  applying  naive  thresholding  to  the  curvelet  transform 
of  such  an  object,  one  can  form  m- term  approximations  with  rate  of  L2 
approximation  rivaling  the  rate  obtainable  by  complex  adaptive  schemes 
which  attempt  to  ‘track’  the  discontinuity  set.  In  this  article  we  explain 
the  basic  issues  of  efficient  m-term  approximation,  the  construction  of 
efficient  adaptive  representation,  the  construction  of  the  curvelet  frame, 
and  a crude  analysis  of  the  performance  of  curvelet  schemes. 


§1.  Introduction 

In  many  important  imaging  applications,  images  exhibit  edges  - discontinu- 
ities across  curves.  In  traditional  photographic  imaging,  for  example,  this 
occurs  whenever  one  object  occludes  another,  causing  the  luminance  to  un- 
dergo step  discontinuities  at  boundaries.  In  biological  imagery,  this  occurs 
whenever  two  different  organs  or  tissue  structures  meet. 

In  image  synthesis  applications,  such  as  CAD,  there  is  no  problem  in  deal- 
ing with  such  discontinuities,  because  one  knows  where  they  are  and  builds  the 
discontinuities  into  the  representation  by  specially  adapting  the  representation 
— for  example,  inserting  free  knots,  or  adaptive  refinement  rules. 

In  image  analysis  applications,  the  situation  is  different.  When  working 
with  real  rather  than  synthetic  data,  one  of  course  doesn’t  ‘know’  where  these 
edges  are;  one  only  has  a digitized  pixel  array,  with  potential  imperfections 
caused  by  noise,  by  blurring,  and  of  course  by  the  unnatural  pixelization 
of  the  underlying  continuous  scene.  Hence  the  typical  image  analyst  only 
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has  recourse  to  representations  which  don’t  ‘know’  about  the  existence  and 
geometry  of  the  discontinuities  in  the  image. 

The  success  of  discontinuity-adapting  methods  in  CAD  and  related  image 
synthesis  fields  creates  a temptation  for  an  image  analyst  - a temptation  to 
spend  a great  deal  of  time  and  effort  importing  such  ideas  into  image  analysis. 
Almost  everyone  we  know  has  yielded  to  this  temptation  in  some  form,  which 
creates  a possibility  for  surprise. 

Oracles  and  ideally-adapted  representation 

One  could  imagine  an  ideally-privileged  image  analyst  who  has  recourse  to 
an  oracle  able  to  reveal  the  positions  of  all  the  discontinuities  underlying  the 
image  formation.  It  seems  natural  that  this  ideally-privileged  analyst  could 
do  far  better  than  the  normally-endowed  analyst  who  knows  nothing  about 
the  position  of  the  discontinuities  in  the  image. 

To  elaborate  this  distinction,  we  introduce  terminology  borrowed  from 
fluid  dynamics,  where  ‘edges’  arise  in  the  form  of  fronts  or  shock  fronts. 

A Lagrangian  representation  is  constructed  using  full  knowledge  of  the  intrinsic 
structure  of  the  object  and  adapting  perfectly  to  that  structure. 

• In  fluid  dynamics  this  means  that  the  fluid  flow  pattern  is  known,  and 
one  constructs  a coordinate  system  which  ‘flows  along  with  the  particles’, 
with  coordinates  mimicking  the  shape  of  the  flow  streamlines. 

• In  image  representation  this  could  mean  that  the  edge  curves  are  known, 
and  one  constructs  an  image  representation  adapted  to  the  structure  of 
the  edge  curves.  For  example,  one  might  construct  a basis  with  disconti- 
nuities exactly  where  the  underlying  object  has  discontinuities. 

An  Eulerian  representation  is  fixed,  constructed  once  and  for  all.  It  is  non- 
adaptive  - having  nothing  to  do  with  the  known  or  hypothesized  details  of 
the  underlying  object. 

• In  fluid  dynamics,  this  would  mean  a usual  euclidean  coordinate  system, 
one  that  does  not  depend  in  any  way  on  the  fluid  motion. 

• In  image  representation,  this  could  mean  that  the  representation  is  some 
fixed  coordinate  representation,  such  as  wavelets  or  sinusoids,  which  does 
not  change  depending  on  the  positions  of  edges  in  the  image. 

It  is  quite  natural  to  suppose  that  the  Lagrangian  perspective,  when  it  is 
available,  is  much  more  powerful  that  the  Eulerian  one.  Having  the  privilege  of 
‘inside  information’  about  the  position  of  important  geometric  characteristics 
of  the  solution  seems  a priori  rather  valuable.  In  fact,  this  position  has 
rather  a large  following.  Much  recent  work  in  computational  harmonic  analysis 
(CHA)  attempts  to  find  bases  which  are  optimally  adapted  to  the  specific 
object  in  question  [7,10,11];  in  this  sense  much  of  the  ongoing  work  in  CHA 
is  based  on  the  presumption  that  the  Lagrangian  viewpoint  is  best. 

In  the  setting  of  edges  in  images,  there  has,  in  fact,  been  considerable 
interest  in  the  problem  of  developing  representations  which  are  adapted  to 
the  structure  of  discontinuities  in  the  object  being  studied.  The  (equivalent) 
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concepts  of  probing  and  minimum  entropy  segmentation  are  old  examples  of  this: 
wavelet  systems  which  are  specifically  constructed  to  allow  discontinuities  in 
the  basis  elements  at  specific  locations  [8,9].  More  recently,  we  are  aware 
of  much  informal  unpublished  or  preliminary  work  attempting  to  build  2D 
edge-adapted  schemes;  we  give  two  examples. 

• Adaptive  triangulation  aims  to  represent  a smooth  function  by  partition- 
ing the  plane  into  a sequence  of  triangular  meshes,  refining  the  meshes 
at  one  stage  to  create  finer  meshes  at  the  next  stage.  One  represents  the 
underlying  object  using  piecewise  linear  functions  supported  on  individ- 
ual triangles.  It  is  easy  to  see  how,  in  an  image  synthesis  setting,  one 
can  in  principle  develop  a triangulation  where  the  triangles  are  arranged 
to  track  a discontinuity  very  faithfully,  with  the  bulk  of  refinement  steps 
allocated  to  refinements  near  the  discontinuity,  and  one  obtains  very  ef- 
fective representation  of  the  object.  It  is  not  easy  to  see  how  to  do  this 
in  an  image  analysis  setting,  but  one  can  easily  be  persuaded  that  the 
development  of  adaptive  triangulation  schemes  for  noisy,  blurred  data  is 
an  important  and  interesting  project. 

• In  an  adaptively  warped  wavelet  representation,  one  deforms  the  under- 

lying image  so  that  the  object  being  analyzed  has  all  its  discontinuities 
aligned  purely  horizontal  or  vertical.  Then  one  analyzes  the  warped  ob- 
ject in  a basis  of  tensor-product  wavelets  where  elements  take  the  form 
ipj,k(z  1)  • This  is  very  effective  for  objects  which  are  smooth 

apart  from  purely  horizontal  and  purely  vertical  discontinuities.  Hence, 
the  warping  deforms  the  singularities  to  render  the  the  tensor  product 
scheme  very  effective.  It  is  again  not  easy  to  see  how  adaptive  warping 
could  work  in  an  image  analysis  setting,  but  one  is  easily  persuaded  that 
development  of  adaptively  warped  representations  for  noisy,  blurred  data 
is  an  important  and  interesting  project. 

Activity  to  build  such  adaptive  representations  is  based  on  an  article  of  faith: 
namely,  that  Eulerian  approaches  are  inferior,  that  oracle-driven  Lagrangian 
approaches  are  ideal,  and  that  one  should,  in  an  image  analysis  setting,  mimic 
Lagrangian  approaches,  attempting  empirically  to  estimate  from  noisy,  blurred 
data  the  information  that  an  oracle  would  supply,  and  build  an  adaptive  rep- 
resentation based  on  that  information. 

Quantifying  rates  of  approximation 

In  order  to  get  away  from  articles  of  faith,  we  now  quantify  performance,  using 
an  asymptotic  viewpoint. 

Suppose  we  have  an  object  supported  in  [0,  l]2  which  has  a discontinuity 
across  a nice  curve  F,  and  which  is  otherwise  smooth.  Then  using  a standard 
Fourier  representation,  and  approximating  with  built  from  the  best  m 
nonzero  Fourier  terms,  we  have 


||/-/£||2xm-1/2, 


m 


oo. 


(i) 
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This  rather  slow  rate  of  approximation  is  improved  upon  by  wavelets.  The 
approximant  built  from  the  best  m nonzero  wavelet  terms  satisfies 

Wf-fm  Hi-™-1.  m->00.  (2) 

This  is  better  than  the  rate  of  Fourier  approximation,  and,  until  now,  is  the 
best  published  rate  for  a fixed  non-adaptive  method  (i.e.  best  published  result 
for  an  ‘Eulerian  viewpoint’). 

On  the  other  hand,  we  will  discuss  below  a method  which  is  adapted  to 
the  object  at  hand,  and  which  achieves  a much  better  approximation  rate 
than  previously  known  ‘nonadaptive’  or  ‘Eulerian’  approaches.  This  adaptive 
method  selects  terms  from  an  overcomplete  dictionary  and  is  able  to  achieve 

\\f  ~ fmWl-rn-2,  m — > oo.  (3) 

Roughly  speaking,  the  terms  in  this  dictionary  amount  to  triangular  wedges, 
ideally  fitted  to  approximate  the  shape  of  the  discontinuity. 

Owing  to  the  apparent  trend  indicated  by  (l)-(3)  and  the  prevalence  of 
the  puritanical  belief  that  ‘you  can’t  get  something  for  nothing’,  one  might 
suppose  that  inevitably  would  follow  the 

Folk-Conjecture/ [Folk-Theorem].  The  result  (3)  for  adaptive  representa- 
tions far  exceeds  the  rate  of  m-term  approximation  achievable  by  fixed  non- 
adaptive representations. 

This  conjecture  appeals  to  a number  of  widespread  beliefs: 

• the  belief  that  adaptation  is  very  powerful, 

• the  belief  that  the  way  to  represent  discontinuities  in  image  analysis  is  to 
mimic  the  approach  in  image  synthesis, 

• the  belief  that  wavelets  give  the  best  fixed  nonadaptive  representation. 
In  private  discussions  with  many  respected  researchers  we  have  many 

times  heard  expressed  views  equivalent  to  the  purported  Folk-Theorem. 

The  surprise 

It  turns  out  that  performance  almost  equivalent  to  (3)  can  be  achieved  by  a 
nonadaptive  scheme.  In  other  words,  the  Folk-Theorem  is  effectively  false. 

There  is  a tight  frame,  fixed  once  and  for  all  nonadaptively,  which  we  call 
a frame  of  curvelets,  which  competes  surprisingly  well  with  the  ideal  adaptive 
rate  (3).  A very  simple  m- term  approximation  - summing  the  m biggest  terms 
in  the  curvelet  frame  expansion  - can  achieve 

11/ -/mill  < C-m~2 (log m)3,  m —*  oo,  (4) 

which  is  nearly  as  good  as  (3)  as  regards  asymptotic  order.  In  short,  in  a 
problem  of  considerable  applied  relevance,  where  one  would  have  thought  that 
adaptive  representation  was  essentially  more  powerful  than  fixed  nonadaptive 
representation,  it  turns  out  that  a new  fixed  nonadaptive  representation  is  es- 
sentially as  good  as  adaptive  representation,  from  the  point  of  view  of  asymp- 
totic m-term  approximation  errors.  As  one  might  expect,  the  new  nonadaptive 
representation  has  several  very  subtle  and  distinctive  features. 
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In  this  article,  we  would  like  to  give  the  reader  an  idea  of  why  (3)  represents 
the  ideal  behavior  of  an  adaptive  representation,  of  how  the  curvelet  frame  is 
constructed,  and  of  the  key  elements  responsible  for  (4).  We  will  also  attempt 
to  indicate  why  curvelets  perform  for  singularities  along  curves  the  task  that 
wavelets  perform  for  singularities  at  points. 

§2.  A Precedent:  Wavelets  and  Point  Singularities 

We  mention  an  important  precedent  - a case  where  a nonadaptive  scheme  is 
roughly  competitive  with  an  ideal  adaptive  scheme.  Suppose  we  have  a piece- 
wise  polynomial  function  / on  the  interval  [0, 1],  with  jump  discontinuities  at 
several  points. 

An  obvious  adaptive  representation  is  to  fit  a piecewise  polynomial  with 
breakpoints  at  the  discontinuities.  If  there  are  P pieces  and  each  polynomial 
is  of  degree  < D,  then  we  need  only  keep  P ■ (D  + 1)  coefficients  and  P — 1 
breakpoints  to  exactly  represent  this  function.  Common  sense  tells  us  that 
this  is  the  natural,  and  even,  the  ideal  representation  for  such  a function. 

To  build  this  representation,  we  need  to  know  locations  of  the  discontinu- 
ities. If  the  measurements  are  noisy  or  blurred,  and  if  we  don’t  have  recourse 
to  an  oracle,  then  we  can’t  necessarily  build  this  representation. 

A less  obvious  but  much  more  robust  representation  is  to  take  a nice 
wavelet  transform  of  the  object,  and  keep  the  few  resulting  nonzero  wavelet 
coefficients.  If  we  have  an  A-point  digital  signal  f(i/N),  1 < i < N,  and 
we  use  Daubechies  wavelets  of  compact  support,  then  there  are  no  more  than 
C ■ log 2 (A)  • P ■ (D  + 1)  nonzero  wavelet  coefficients  for  the  digital  signal. 

In  short,  the  nonadaptive  representation  needs  only  to  keep  a factor 
C log2(A)  more  data  to  give  an  equally  faithful  representation. 

We  claim  that  this  phenomenon  is  at  least  partially  responsible  for  the 
widespread  success  of  wavelet  methods  in  data  compression  settings.  One  can 
build  a single  fast  transform  and  deal  with  a wide  range  of  different  /,  with 
different  discontinuity  sets,  without  recourse  to  an  oracle. 

In  particular,  since  one  almost  never  has  access  to  an  oracle,  the  nat- 
ural first  impulse  of  one  committed  to  the  adaptive  viewpoint  would  be  to 
‘estimate’  the  break  points  - i.e.  to  perform  some  sort  of  edge  detection.  Un- 
fortunately this  is  problematic  when  one  is  dealing  with  noisy  blurred  data. 
Edge  detection  is  a whole  topic  in  itself  which  has  thousands  of  proposed  so- 
lutions and  (evidently,  as  one  can  see  from  the  continuing  rate  of  publication 
in  this  area)  no  convincing  solution. 

In  using  wavelets,  one  does  not  need  edge  detectors  or  any  other  prob- 
lematic schemes,  one  simply  extracts  the  big  coefficients  from  the  transform 
domain,  and  records  their  values  and  positions  in  an  organized  fashion. 

We  can  lend  a useful  perspective  to  this  phenomenon  by  noticing  that  the 
discontinuities  in  the  underlying  / are  point  singularities , and  we  are  saying 
that  wavelets  need  in  some  sense  at  most  log(n)  coefficients  to  represent  a 
point  singularity  out  to  scale  1/n. 
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It  turns  out  that  even  in  higher  dimensions,  wavelets  have  a near-ideal 
ability  to  represent  objects  with  point  singularities. 

The  two-dimensional  object  fp(xi,X2)  — l/((xi  — 1 /2)2  + (z2  - 1 /2)2)/3 
has,  for  (3  < 1/2,  a square-integrable  singularity  at  the  point  (1/2, 1/2)  and  is 
otherwise  smooth.  At  each  level  of  the  2D  wavelet  pyramid,  there  are  effec- 
tively only  a few  wavelets  which  ‘feel’  the  point  singularity,  other  coefficients 
being  effectively  negligible.  In  approximation  out  to  scale  1/n,  only  about 
0(log(n))  coefficients  are  required. 

Another  approach  to  understanding  the  representation  of  singularities, 
which  is  not  limited  by  scale,  is  to  consider  rates  of  decay  of  the  countable 
coefficient  sequence.  Analysis  of  wavelet  coefficients  of  fp  shows  that  for  any 
desired  rate  p,  the  A-th  largest  coefficient  can  be  bounded  by  CpN~p  for  all 
A.  In  short,  the  wavelet  coefficients  of  such  an  object  are  very  sparse. 

Thus  we  have  a slogan:  wavelets  perform  very  well  for  objects  with  point 
singularities  in  dimensions  1 and  2. 

§3.  Failure  of  Wavelets  on  Edges 

We  now  briefly  sketch  why  wavelets,  which  worked  surprisingly  well  in  repre- 
senting point  discontinuities  in  dimension  1,  are  less  successful  dealing  with 
‘edge’  discontinuities  in  dimension  2. 

Suppose  we  have  an  object  / on  the  square  [0,  l]2  and  that  / is  smooth 
away  from  a discontinuity  along  a C 2 curve  T.  Let’s  look  at  the  number  of 
substantial  wavelet  coefficients. 

A grid  of  squares  of  side  2-7  by  2~3  has  order  2J  squares  intersecting  T. 
At  level  j of  the  two-dimensional  wavelet  pyramid,  each  wavelet  is  localized 
near  a corresponding  square  of  side  2~7  by  2~J.  There  are  therefore  0(2J) 
wavelets  which  ‘feel’  the  discontinuity  along  T.  Such  a wavelet  coefficient  is 
controlled  by 


< ll/lloo  - ||V'J-,fc„fc3||i  <C-2-J; 

and  in  effect  no  better  control  is  available,  since  the  object  / is  not  smooth 
within  the  support  of  ifj,k1,k2  [14].  Therefore  there  are  about  23  coefficients  of 
size  about  2~3 . In  short,  the  A-th  largest  wavelet  coefficient  is  of  size  about 
1/A.  The  result  (2)  follows. 

We  can  summarize  this  by  saying  that  in  dimension  2,  discontinuities 
across  edges  are  spatially  distributed;  because  of  this  they  can  interact  rather 
extensively  with  many  terms  in  the  wavelet  expansion,  and  so  the  wavelet 
representation  is  not  sparse. 

In  short,  wavelets  do  well  for  point  singularities,  and  not  for  singularities 
along  curves.  The  success  of  wavelets  in  dimension  1 derived  from  the  fact 
that  all  singularities  in  dimension  1 are  point  singularities,  so  wavelets  have 
a certain  universality  there.  In  higher  dimensions  there  are  more  types  of 
singularities,  and  wavelets  lose  their  universality. 

For  balance,  we  need  to  say  that  wavelets  do  outperform  classical  meth- 
ods. If  we  used  sinusoids  to  represent  an  object  of  the  above  type,  then  we 
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have  the  result  (1),  which  is  far  worse  than  that  provided  by  wavelets.  For 
completeness,  we  sketch  the  argument.  Suppose  we  use  for  ‘sinusoids’  the 
complex  exponentials  on  [—n,n]2,  and  that  the  object  / tends  smoothly  to 
zero  at  the  boundary  of  the  square  [0,  l]2,  so  that  we  may  naturally  extend 
it  to  a function  living  on  [ — 7r,  zr]2.  Now  typically  the  Fourier  coefficients  of 
an  otherwise  smooth  object  with  a discontinuity  along  a curve  decay  with 
wavenumber  as  |fc|~3/2  (the  very  well-known  example  is  / = indicator  of  a 
disk,  which  has  a Fourier  transform  described  by  Bessel  functions).  Thus 
there  are  about  R 2 coefficients  of  size  > c ■ f?-3/2,  meaning  that  the  jV-th 
largest  is  of  size  > c ■ 1V~3/4,  from  which  (1)  follows. 

In  short:  neither  wavelets  nor  sinusoids  really  sparsify  two-dimensional 
objects  with  edges  (although  wavelets  are  better  than  sinusoids). 

§4.  Ideal  Representation  of  Objects  with  Edges 

We  now  consider  the  optimality  result  (3),  which  is  really  two  assertions.  On 
the  one  hand,  no  reasonable  scheme  can  do  better  than  this  rate.  On  the 
other  hand,  there  is  a certain  adaptive  scheme,  with  intimate  connections  to 
adaptive  triangulation,  which  achieves  it.  For  more  extensive  discussion  see 
[10,11,13]. 

In  talking  about  adaptive  representations,  we  need  to  define  terms  care- 
fully, for  the  following  reason.  For  any  /,  there  is  always  an  adaptive  repre- 
sentation of  / that  does  very  well:  namely  the  orthobasis  'k  = {ipo,  ipi, . . .} 
with  first  element  ipo  = //II/II2!  This  is,  in  a certain  conception,  an  ‘ideal 
representation’  where  each  object  requires  only  one  nonzero  coefficient.  In  a 
certain  sense  it  is  a useless  one,  since  all  information  about  / has  been  hidden 
in  the  definition  of  representation,  so  actually  we  haven’t  learned  anything. 
Most  of  our  work  in  this  section  is  in  setting  up  a notion  of  adaptation  that 
will  free  us  from  fear  of  being  trapped  at  this  level  of  triviality. 

Dictionaries  of  atoms 

Suppose  we  are  interested  in  approximating  a function  in  L2(T),  and  we  have  a 
countable  collection  V = {cf>}  of  atoms  in  L2(T);  this  could  be  a basis,  a frame, 
a finite  concatenation  of  bases  or  frames,  or  something  even  less  structured. 

We  consider  the  problem  of  m-term  approximation  from  this  dictionary, 
where  we  are  allowed  to  select  m terms  (j>  1 , . . . , 4>m  from  V and  we  approximate 
/ from  the  L2-closest  member  of  the  subspace  they  span: 


fm  =Proj{f  |span(0i,...,0m)}. 


We  are  interested  in  the  behavior  of  the  m-term  approximation  error 


em(f;V)  = \\f-fm\\l 


where  in  this  provisional  definition,  we  assume  fm  is  a best  approximation  of 
this  form  after  optimizing  over  the  selection  of  m terms  from  the  dictionary. 
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However,  to  avoid  a trivial  result,  we  impose  regularity  on  the  selection 
process.  Indeed,  we  allow  rather  arbitrary  dictionaries,  including  ones  which 
enumerate  a dense  subset  of  L2(T),  so  that  in  some  sense  the  trivial  result 
01  = //||/||2  em  = 0,  Vm  is  always  a lurking  possibility.  To  avoid  this 
possibility  we  forbid  arbitrary  selection  rules.  Following  [10]  we  propose 

Definition.  A sequence  of  selection  rules  (orm(-))  choosing  m terms  from  a 
dictionary  T>, 

&m{f)  = (01>  ***)  0m)j 

is  said  to  implement  polynomial  depth  search  if  there  is  a single  fixed  enumer- 
ation of  the  dictionary  elements  and  a fixed  polynomial  7 r(t)  such  that  terms 
in  om(f ) come  from  the  first  ir(m)  elements  in  the  dictionary. 

Under  this  definition,  the  trivial  representation  based  on  a countable 
dense  dictionary  is  not  generally  available,  since  in  any  fixed  enumeration, 
a decent  1-term  approximation  to  typical  / will  typically  be  so  deep  in  the 
enumeration  as  to  be  unavailable  for  polynomial-depth  selection.  (Of  course, 
one  can  make  this  statement  quantitative,  using  information-theoretic  ideas). 

More  fundamentally,  our  definition  not  only  forbids  trivialities,  but  it 
allows  us  to  speak  of  optimal  dictionaries  and  get  meaningful  results.  Starting 
now,  we  think  of  dictionaries  as  ordered , having  a first  element,  second  element, 
etc.,  so  that  different  enumerations  of  the  same  collection  of  functions  are 
different  dictionaries.  We  define  the  m-optimal  approximation  number  for 
dictionary  V and  limit  polynomial  7r  as 

em(/;  £>;*■)  = 11/  - /mill. 

where  fm  is  constructed  by  optimizing  the  choice  of  m atoms  among  the  first 
7r(m)  in  the  fixed  enumeration.  Note  that  we  use  squared  error  for  comparison 
with  (l)-(3)  in  the  Introduction. 

Approximating  classes  of  functions 

Suppose  we  now  have  a class  T of  functions  whose  members  we  wish  to  ap- 
proximate. Suppose  we  are  given  a countable  dictionary  V and  polynomial 
depth  search  delimited  by  polynomial  7r(-). 

Define  the  error  of  approximation  by  this  dictionary  over  this  class  by 

em(F',V,n)  = rnaxem(/;P,7r). 

We  may  find,  in  certain  examples,  that  we  can  establish  bounds 
em(fF-,V,ir)  = 0(m~p),  m — > 00, 

for  all  p < p*.  At  the  same  time,  we  may  have  available  an  argument  showing 
that  for  every  dictionary  and  every  polynomial  depth  search  rule  delimited  by 
*■(•)> 

em(fF-  V,  7r)  > cm  p , m > mo(7r). 

Then  it  seems  natural  to  say  that  p*  is  the  optimal  rate  of  m-term  approxi- 
mation from  any  dictionary  when  polynomial  depth  search  delimited  by  7r(  ). 
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Starshaped  objects  with  C2  boundaries 

We  define  Star-Set2(C),  a class  of  star-shaped  sets  with  C2-smooth  bound- 
aries, by  imposing  regularity  on  the  boundaries  using  a kind  of  polar  coor- 
dinate system.  Let  p(9)  : [0,  27t)  — > [0, 1]  be  a radius  function  and  b0  = 
(xi.o,  £2,0)  be  an  origin  with  respect  to  which  the  set  of  interest  is  star-shaped. 
With  6i(x)  = X{  — a:*, 0,  i = 1,2,  define  functions  9(x  1,2:2)  and  r(x  1,2:2)  by 

9 = arctan(-(52/£i);  r = ((<5i)2  + (62)2)1/2- 

For  a starshaped  set,  we  have  (£1,2:2)  € B iff  0 < r < p(9).  Define  the  class 
Star-Set2  (C)  of  sets  by 

{B : B c [^’  ^]2’  To  - p{e)  ~\'  9e  [0’ 2r)’  pe°2’ - c}’ 

and  consider  the  corresponding  functional  class 

Star2(C)  = {/  = 1B:  Be  Star-Set2  (£7)} . 

The  following  lower  rate  bound  should  be  compared  with  (3). 

Lemma.  Let  the  polynomial  7 r(-)  be  given.  There  is  a constant  c so  that,  for 
every  dictionary  T>, 

em(Star2(C);  D,  ir)  > c „■■  ■*  , m ->  00. 

log(m) 

This  is  proved  in  [10]  by  the  technique  of  hypercube  embedding.  Inside 
the  class  Star2(C)  one  can  embed  very  high-dimensional  hypercubes,  and  the 
ability  of  a dictionary  to  represent  all  members  of  a hypercube  of  dimension  n 
by  selecting  m -C  rc  terms  from  a subdictionary  of  size  7r(m)  is  highly  limited 
if  7r(m)  grows  only  polynomially. 

For  each  /,  a corresponding  orthobasis  is  adaptively  constructed  in  [13] 
which  achieves  the  rate  (3).  It  tracks  the  boundary  of  B at  increasing  accuracy 
using  a sequence  of  polygons;  in  fact  these  are  n-gons  connecting  equispaced 
points  along  the  boundary  of  B,  for  n = 2L  The  difference  between  n-gons 
for  n = 2J  and  n = 2J+1  is  a collection  of  thin  triangular  regions  obeying 
width  ss  length2;  taking  the  indicators  of  each  region  as  a term  in  a basis, 
one  gets  an  orthonormal  basis  whose  terms  at  fine  scales  are  thin  triangular 
pieces.  Estimating  the  coefficient  sizes  by  simple  geometric  analysis  leads  to 
the  result  (3).  In  fact,  [13]  shows  how  to  do  this  under  the  constraint  of 
polynomial-depth  selection,  with  polynomial  Cm.7. 

Although  space  constraints  prohibit  a full  explanation,  our  polynomial- 
depth  search  formalism  also  makes  perfect  sense  in  discussing  the  warped 
wavelet  representations  of  the  Introduction.  Consider  the  noncountable  ‘dic- 
tionary’ of  all  wavelets  in  a given  basis,  with  all  continuum  warpings  applied. 
Notice  that  for  wavelets  at  a given  fixed  scale,  warpings  can  be  quantized  with 
a certain  finite  accuracy.  Carefully  specifying  the  quantization  of  the  warping, 
one  obtains  a countable  collection  of  warped  wavelets,  for  which  polynomial 
depth  search  constraints  make  sense,  and  which  is  as  effective  as  adaptive 
triangulation,  but  not  more  so.  Hence  (3)  applies  to  (properly  interpreted) 
deformation  methods  as  well. 
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§5.  Curvelet  Construction 

We  now  briefly  describe  the  curvelet  construction.  It  is  based  on  combining 
several  ideas,  which  we  briefly  review: 

• Ridgelets,  a method  of  analysis  suitable  for  objects  with  discontinuities 
across  straight  lines. 

• Multiscale  Ridgelets,  a pyramid  of  windowed  ridgelets,  renormalized  and 
transported  to  a wide  range  of  scales  and  locations. 

• Bandpass  Filtering,  a method  of  separating  an  object  into  a series  of  dis- 
joint scales. 

We  briefly  describe  each  idea  in  turn,  and  then  their  combination. 

Ridgelets 

The  theory  of  ridgelets  was  developed  in  the  Ph.D.  Thesis  of  Emmanuel 
Candes  (1998).  In  that  work,  Candes  showed  that  one  could  develop  a system 
of  analysis  based  on  ridge  functions 

ipa, 6,0 (xi,  *2)  = a~1/2ip((xi  cos (6)  + x2sin(<?)  - b)/a).  (5) 

He  introduced  a continuous  ridgelet  transform  Rj{a,  b,  0)  = (ipa]b,e(x),  f)  with 
a reproducing  formula  and  a Parseval  relation.  He  also  constructed  frames, 
giving  stable  series  expansions  in  terms  of  a special  discrete  collection  of  ridge 
functions.  The  approach  was  general,  and  gave  ridgelet  frames  for  functions 
in  L2\ 0,  l]d  in  all  dimensions  d>  2 - For  further  developments,  see  [3,5]. 

Donoho  [12]  showed  that  in  two  dimensions,  by  heeding  the  sampling  pat- 
tern underlying  the  ridgelet  frame,  one  could  develop  an  orthonormal  set  for 
Z/2(]R2)  having  the  same  applications  as  the  original  ridgelets.  The  orthonor- 
mal ridgelets  are  convenient  to  use  for  the  curvelet  construction,  although  it 
seems  clear  that  the  original  ridgelet  frames  could  also  be  used.  The  ortho- 
ridgelets  are  indexed  using  A = where  j indexes  the  ridge  scale,  k 

the  ridge  location,  i the  angular  scale,  and  l the  angular  location;  e is  a gender 
token.  Roughly  speaking,  the  ortho-ridgelets  look  like  pieces  of  ridgelets  (5) 
which  are  windowed  to  lie  in  discs  of  radius  about  21;  £ = 1/ 2l  is  roughly 

the  orientation  parameter,  and  2_■,  is  roughly  the  thickness. 

A formula  for  ortho-ridgelets  can  be  given  in  the  frequency  domain 

MO  = \0~H^jA\0)wU(0  + hAOOMA6  + *))/2  ■ 

Here  the  -0j,fc  are  Meyer  wavelets  for  IR,  w\  t are  periodic  wavelets  for  [— n,  n), 
and  indices  run  as  follows:  j,  k G TL,  (.  = 0, . . . , 2'-1  — 1;  i > 1,  and,  if  e = 0, 
i = max(l,  j),  while  if  e = 1,  i > max(l,j).  We  let  A be  the  set  of  such  A. 
The  formula  is  an  operationalization  of  the  ridgelet  sampling  principle: 

• Divide  the  frequency  domain  in  dyadic  coronae  |£|  G [2J,  2J+1], 

• In  the  angular  direction,  sample  the  j-th  corona  at  least  2J  times. 

• In  the  radial  frequency  direction,  sample  behavior  using  local  cosines. 


Curvelets 


115 


The  sampling  principle  can  be  motivated  by  the  behavior  of  Fourier  trans- 
forms of  functions  with  singularities  along  lines.  Such  functions  have  Fourier 
transforms  which  decay  slowly  along  associated  lines  through  the  origin  in  the 
frequency  domain.  As  one  traverses  a constant  radius  arc  in  Fourier  space, 
one  encounters  a ‘Fourier  ridge’  when  crossing  the  line  of  slow  decay.  The 
ridgelet  sampling  scheme  tries  to  represent  such  Fourier  transforms  by  using 
wavelets  in  the  angular  direction,  so  that  the  ‘Fourier  ridge’  is  captured  neatly 
by  one  or  a few  wavelets.  In  the  radial  direction,  the  Fourier  ridge  is  actu- 
ally oscillatory,  and  this  is  captured  by  local  cosines.  A precise  quantitative 
treatment  is  given  in  [4]. 

Multiscale  ridgelets 

Think  of  ortho-ridgelets  as  objects  which  have  a “length”  of  about  1 and  a 
“width”  which  can  be  arbitrarily  fine.  The  multiscale  ridgelet  system  renor- 
malizes and  transports  such  objects,  so  that  one  has  a system  of  elements  at 
all  lengths  and  all  finer  widths. 

In  a light  mood,  we  may  describe  the  system  impressionistically  as  “brush 
strokes”  with  a variety  of  lengths,  thicknesses,  orientations  and  locations. 

The  construction  employs  a nonnegative,  smooth  partition  of  energy 
function  w,  obeying  k vj2(xi  - ki,x2  — k2)  = 1.  Define  a transport 
operator,  so  that  with  index  Q indicating  a dyadic  square  Q = (s,  k\,  k2) 
of  the  form  + 1)/2S)  x [k2/2s,{k2  + 1)/2S),  by  (TQf)(xltxa)  = 

f(2sxi  — ki,2sx2  — k2).  The  Multiscale  Ridgelet  with  index  fj,  = ( Q , A)  is  then 


Vv  =2 s -TQ{w-px). 

In  short,  one  transports  the  normalized,  windowed  ortho-ridgelet. 

Letting  Qs  denote  the  dyadic  squares  of  side  2-s,  we  can  define  the 
subcollection  of  Monoscale  Ridgelets  at  scale  s: 

Ms  = {(Q,  A) : Q € Qs,  A € A}. 

Orthonormality  of  the  ridgelets  implies  that  each  system  of  monoscale  ridgelets 
makes  a tight  frame,  in  particular  obeying  the  Parseval  relation 

E to../)2  = 11/11!’- 

neMs 

It  follows  that  the  dictionary  of  multiscale  ridgelets  at  all  scales,  indexed  by 

M = Us>iMs, 

is  not  frameable,  as  we  have  energy  blow-up: 

E (V’m-/)2  = (6) 
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The  Multiscale  Ridgelets  dictionary  is  simply  too  massive  to  form  a good  ana- 
lyzing set.  It  lacks  inter-scale  orthogonality  - is  not  typically  orthogonal 

to  ip(Q\y)  if  Q and  Q'  are  squares  at  different  scales  and  overlapping  loca- 
tions. In  analyzing  a function  using  this  dictionary,  the  repeated  interactions 
with  all  different  scales  causes  energy  blow-up  (6). 

Our  construction  of  curvelets  solves  (6)  by  disallowing  the  full  richness  of 
the  Multiscale  Ridgelets  dictionary.  Instead  of  allowing  ‘brushstrokes’  of  all 
different  ‘lengths’  and  ‘widths’,  we  allow  only  those  where  width  « length 2 . 

Subband  filtering 

Our  solution  to  the  ‘energy  blow-up’  (6)  is  to  decompose  / into  subbands  using 
standard  filterbank  ideas.  Then  we  assign  one  specific  monoscale  dictionary 
Ms  to  analyze  one  specific  (and  specially  chosen)  subband. 

We  define  coronae  of  frequencies  |£|  6 [22s,  22s+2],  and  subband  filters  As 
extracting  components  of  / in  the  indicated  subbands;  a filter  Pq  deals  with 
frequencies  |£|  < 1.  The  filters  decompose  the  energy  exactly  into  subbands: 

ll/lli  = ll^o/|li  + EllA^lli- 

s 

The  construction  of  such  operators  is  standard  [15];  the  coronization  oriented 
around  powers  22s  is  nonstandard  - and  essential  for  us.  Explicitly,  we  build  a 
sequence  of  filters  $o  and  $2s  = 24s$(22s’),  s = 0,1,2,...  with  the  following 
properties:  $o  is  a lowpass  filter  concentrated  near  frequencies  |£|  < 1;  Vt^s  is 
bandpass,  concentrated  near  |£|  € [22s,22s+z];  and  we  have 

l<MOI2  + El*(2~2s£)l2  = 1-  v*. 

s>0 


Hence,  As  is  simply  the  convolution  operator  A sf  = tl^s  * /• 

Definition  of  curvelet  transform 

Assembling  the  above  ingredients,  we  are  able  to  sketch  the  definition  of  the 
Curvelet  transform.  We  let  M'  consist  of  M merged  with  the  collection  of 
integral  pairs  (Aq,^)  indexing  unit-side  squares  in  the  plane. 

The  curvelet  transform  is  a map  L2(1R2)  i— > f2(M'),  yielding  Curvelet 
coefficients  (aM  : g € M').  These  come  in  two  types.  At  coarse  scale  we  have 
wavelet  scaling  function  coefficients 

= (fat, k2,Pof),  H = (h,k2)  € M'\M, 

where  4>k1,k2  is  the  Lemarie  scaling  function  of  the  Meyer  basis,  while  at  fine 
scale  we  have  Multiscale  Ridgelets  coefficients  of  the  bandpass  filtered  object: 

— (Asf,  , fk  £ Ms , s = 1, 2, . . . . 
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Note  well  that  each  coefficient  associated  to  scale  2 s derives  from  the  subband 
filtered  version  of  / - A sf  - and  not  from  /.  Several  properties  are  immediate: 

• Tight  Frame: 

11/111=  £ M2- 

iieM' 

• Existence  of  Coefficient  Representers  (Frame  Elements): 

= (/,7m)- 

• L2  Reconstruction  Formula: 

/=  £ (/>7m)7m- 

H&M' 

• Formula  for  Frame  Elements: 

7m  = P ^ Ss- 

In  short,  the  curvelets  are  obtained  by  bandpass  filtering  of  Multiscale 
Ridgelets  with  passband  is  rigidly  linked  to  the  scale  of  spatial  localization 

• Anisotropy  Scaling  Law:  Linking  the  filter  passband  |£|  « 22s  to  the 

spatial  scale  2-3  imposes  that  (1)  most  curvelets  are  negligible  in  norm 
(most  multiscale  ridgelets  do  not  survive  the  bandpass  filtering  As);  (2) 
the  non-negligible  curvelets  obey  length  « 2_s  while  width  « 2_2s.  So 
the  system  obeys  approximately  the  scaling  relationship 

width  « length2. 

It  is  here  that  the  22s  coronization  scheme  comes  into  play. 

§6.  Why  Should  This  Work? 

The  curvelet  decomposition  can  be  equivalently  stated  in  the  following  form: 

• Subband  Decomposition.  The  object  / is  filtered  into  subbands: 

/^(P0/,A1/,A2/,...). 

• Smooth  Partitioning.  Each  subband  is  smoothly  windowed  into  “squares” 
of  an  appropriate  scale: 


A sf  >->  (wQ&,f)QeQs. 

• Renormalization.  Each  resulting  square  is  renormalized  to  unit  scale 


9q  — 2_s(Tq)_1(wqAs/),  QzQt 
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• Ridgelet  Analysis.  Each  square  is  analyzed  in  the  ortho-ridgelet  system 

= (9Q,P\),  /*  = (Q,A). 

We  can  now  give  a crude  explanation  of  why  the  main  result  (4)  holds. 
Effectively,  the  bandpass  images  A sf  are  almost  vanishing  at  x far  from  the 
edges  in  /.  Along  edges,  the  bandpass  images  exhibit  ridges  of  width  « 2-2s 
- the  width  of  the  underlying  bandpass  filter. 

The  partitioned  bandpass  images  are  broken  into  squares  of  side  2~s  x2-s. 
The  squares  which  do  not  intersect  edges  have  effectively  no  energy,  and  we 
ignore  them.  The  squares  which  do  intersect  edges  result  from  taking  a nearly- 
straight  ridge  and  windowing  it.  Thus  the  squares  which  ‘matter’  exhibit 
tiny  ridge  fragments  of  aspect  ratio  2~s  by  2_2s.  After  renormalization,  the 
resulting  gQ  exhibits  a ridge  fragment  of  about  unit  length  and  of  width  2~s . 
The  ridge  fragment  is  then  analyzed  in  the  ortho-ridgelet  system,  which  should 
(we  hope)  yield  only  a few  significant  coefficients. 

In  fact,  simple  arguments  of  size  and  order  give  an  idea  how  the  curvelet 
coefficients  roughly  behave.  We  give  an  extremely  loose  description. 

First,  at  scale  2~s,  there  are  only  about  0( 2s)  squares  Q € Qs  that 
interact  with  the  edges.  Calculating  the  energy  in  such  a square  using  the  size 
of  the  support  and  the  height  of  the  ridge  leads  to 

(length  • width)1/2  • height  « (2~s  x 2-28)1/2  x 1. 

Indeed,  the  height  of  the  ridge  is  bounded  by 

l|A./||oo  = 11*2.  * /II oo  < ||*2.||l||/||oo  = ||*||l||/||oo. 

Since  we  are  interested  in  uniformly  bounded  functions  /,  the  height  is  thus 
bounded  by  a constant  C.  The  calculation  of  the  norm  \\gq  1 1 2 ~ 2-3/2  follows 
immediately  (because  of  the  renormalization,  the  height  of  the  ridge  gQ  is  now 
2~s)  . Now  temporarily  suppose  that  for  some  fixed  K not  depending  on  Q, 

each  ridge  fragment  gQ  is  a sum  of  at  most  K ortho-ridgelets.  (7) 

This  would  imply  that  at  level  s we  have  a total  number  of  coefficients 

0(2S)  squares  which  ‘matter’  x A coefficients/square, 

while  the  norm  estimate  for  gQ  and  the  orthonormality  of  ridgelets  give 

coefficient  amplitude  < C ■ 2~3s/2. 

The  above  assumptions  imply  that  the  N- th  largest  curvelet  coefficient  is  of 
size  < C ■ 7V~3/2.  Letting  |a|(jv)  denote  the  A-th  coefficient  amplitude,  the 
tail  sum  of  squares  would  obey 

J2  H(AO  <C-m~2. 

N>m 


(8) 
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This  coefficient  decay  leads  to  (4)  as  follows.  Let  pi,... , pm  enumerate 
indices  of  the  m largest  curvelet  coefficients.  Build  the  m-term  approximation 

m 

fm 

2 = 1 

By  the  tight  frame  property, 

oo 

II/-  /mil2  < l«!(JV).<  c-m~2, 

N=m+ 1 

where  the  last  step  used  (8).  This  of  course  would  establish  (4)  - in  fact 
something  even  stronger,  something  fully  as  good  as  (3). 

However,  we  have  temporarily  assumed  (7)  - which  is  not  true.  Each 
ridge  fragment  generates  a countable  number  of  nonzero  ridgelet  coefficients 
in  general.  The  paper  [6]  gets  (4)  using  much  more  subtle  estimates. 

§7.  Discussion 

Why  call  these  things  curvelets? 

The  visual  appearance  of  curvelets  does  not  match  the  name  we  have  given 
them.  The  curvelets  waveforms  look  like  brushstrokes;  brush  lets  would  have 
been  an  appropriate  name,  but  it  was  taken  already,  by  F.  Meyer  and  R. 
Coifman,  for  an  unrelated  scheme  (essentially,  Gabor  Analysis). 

Our  point  of  view  is  simply  that  curvelets  exemplify  a certain  curve  scaling 
law,  width  = length2,  which  is  naturally  associated  to  curves. 

A deeper  connection  between  curves  and  curvelets  was  alluded  to  in  our 
talk  at  Curves  and  Surfaces  ’99.  Think  of  a curve  in  the  plane  as  a distribu- 
tion supported  on  the  curve,  in  the  same  way  that  a point  in  the  plane  can 
be  thought  of  as  a Dirac  distribution  supported  on  that  point.  The  curvelets 
scheme  can  be  used  to  represent  that  distribution  as  a superposition  of  func- 
tions of  various  lengths  and  widths  obeying  the  scaling  law  width  = length2. 
In  a certain  sense  this  is  a near-optimal  representation  of  the  distribution. 

The  analogy  and  the  surprise 

Sections  2 and  3 showed  that  wavelets  do  surprisingly  well  in  representing 
point  singularities.  Without  attempting  an  explicit  representation  of  ‘where 
the  bad  points  are’,  wavelets  do  essentially  as  well  as  ideal  adaptive  schemes 
in  representing  the  point  singularities. 

Sections  4-6  showed  that  the  non-adaptive  curvelets  representation  can 
do  nearly  as  well  in  representing  objects  with  discontinuities  along  curves  as 
adaptive  methods  that  explicitly  track  the  shape  of  the  discontinuity  and  use 
a special  adaptive  representation  dependent  on  that  tracking. 

We  find  it  surprising  and  stimulating  that  the  curvelet  representation 
can  work  so  well  despite  the  fact  that  it  never  constructs  or  depends  on  the 
existence  of  any  ‘map’  of  the  discontinuity  set. 

We  also  find  it  interesting  that  there  is  a system  of  analysis  which  plays 
the  role  for  curvilinear  singularities  that  wavelets  play  for  point  singularities. 


120 


E.  J.  Candes  and  D.  L.  Donoho 


Acknowledgments.  This  research  was  supported  by  National  Science  Foun- 
dation grants  DMS  98-72890  (KDI),  DMS  95-05151,  and  by  AFOSR  MURI 
95-P49620-96-1-0028. 


References 

1.  Candes,  E.  J.,  Harmonic  analysis  of  neural  networks,  Appl.  Comput.  Har- 
mon. Anal.  6 (1999),  197-218. 

2.  Candes,  E.  J.,  Ridgelets:  theory  and  applications,  Ph.D.  Thesis,  Statistics, 
Stanford,  1998. 

3.  Candes,  E.  J.,  Monoscale  ridgelets  for  the  representation  of  images  with 
edges,  Technical  Report,  Statistics,  Stanford,  1999. 

4.  Candes,  E.  J.,  On  the  representation  of  mutilated  Sobolev  functions,  Tech- 
nical Report,  Statistics,  Stanford,  1999. 

5.  Candes,  E.  J.,  and  D.  L.  Donoho,  Ridgelets:  The  key  to  high-dimensional 
intermittency?  Phil.  Trans.  R.  Soc.  Lond.  A.  357  (1999),  2495-2509. 

6.  Candes,  E.  J.,  and  D.  L.  Donoho,  Curvelets,  Manuscript,  1999. 

7.  Coifman,  R.  R.,  and  M.  V.  Wickerhauser,  Entropy-based  algorithms  for 
best  basis  selection,  IEEE  Trans.  Inform.  Theory  38  (1992),  1713-1716. 

8.  Deng,  B.,  B.  Jawerth,  G.  Peters,  and  W.  Sweldens,  Wavelet  probing  for 
compression-based  segmentation,  in  Proc.  SPIE  Symp.  Math.  Imaging: 
Wavelet  Applications  in  Signal  and  Image  Processing,  1993.  Proceedings 
of  SPIE  conference  July  1993,  San  Diego. 

9.  Donoho,  D.  L.,  Minimum  entropy  segmentation,  in  Wavelets:  Theory, 
Algorithms  and  Applications,  C.  K.  Chui,  L.  Montefusco  and  L.  Puccio 
(eds.),  Academic  Press,  San  Diego,  1994,  233-270. 

10.  Donoho,  D.  L.,  and  I.  M.  Johnstone,  Empirical  Atomic  Decomposition, 
Manuscript,  1995. 

11.  Donoho,  D.  L.,  Wedgelets:  nearly  minimax  estimation  of  edges,  Ann. 
Statist.  27  (1999),  859-897. 

12.  Donoho,  D.  L.,  Orthonormal  ridgelets  and  linear  singularities,  Technical 
Report,  Statistics,  Stanford,  1998,  SIAM  J.  Math.  Anal.,  to  appear. 

13.  Donoho,  D.  L.,  Sparse  components  analysis  and  optimal  atomic  decom- 
position, Technical  Report,  Statistics,  Stanford,  1998. 

14.  Meyer,  Y.,  Wavelets  and  Operators,  Cambridge  University  Press,  1992. 

15.  Vetterli,  M.,  and  J.  Kovacevic,  Wavelets  and  Subband  Coding,  Prentice 
Hall,  1995. 

Department  of  Statistics 

Stanford  University 

Stanford,  CA  94305-4065,  USA 

{emmanuel , donoho}@stat . Stanford . edu 

http : //www-stat . Stanford . edu/{~emmanuel , ^donoho}/ 


